Data Visualization · FH Potsdam · Summer 2023

Tutorial 9: Geovisualization¶

In this last instalment of the data visualization tutorials we will be analyzing and visualizing geographic data; i.e., data that refers to geospatial entities. Geospatial entities can, for example, be particular places such as schools and libraries or political boundaries of cities or countries. Of course, this tutorial only scratches the surface. Consider this as a teaser into the world of geovisualization, which in itself has become a branch of research and practice at the intersection of geography and visualization. We will only touch on a few basic steps to get your feet wet and hands dirty.

🛒 1. Prepare¶

As you come to expect by now we first assemble our tools and then prepare the data.

Install packages¶

For this tutorial we will continue to rely on Altair and Pandas, but add a few packages for geospatial analysis, such as GeoPandas, which will help us to work with DataFrames that contain spatial entities to carry out geometric analysis on them. As before, the additional packages are also listed in the file requirements.txt:

In [ ]:
import altair as alt
import pandas as pd
import geopandas as gpd

To access the data of the OpenStreetMap, we will use methods from the handy package OSMPythonTools and we will draw from GeoPy, which will help us turn addresses into geographic coordinates—these two will be imported later in this notebook.

Once we have the tools assembled, we can get started working with geospatial data. There are actually plenty of formats used to record geospatial data, but GeoJSON has become an important standard for exchanging geospatial data on the web. However, please note that GeoPandas can actually load many other vector-based data formats used in digital cartography, such as shapefiles and GeoPackage.

Import GeoJSON¶

Suppose we would like to get the geographic boundaries of Potsdam's districts, which happen to be published on Potsdam's Open Data Portal. Akin to how we would read a JSON file with Pandas, we can also use read_file() provided by GeoPandas simply by passing the URL to the data set of interest and get a geographic DataFrame back:

In [ ]:
districts = gpd.read_file("https://opendata.potsdam.de/explore/dataset/statistische-bezirke-in-potsdam/download/?format=geojson")

If you replace geojson with shp at the end of the above URL, you can also load this data in a shapefile format. Either way the data is going to be loaded into the data structure of a GeoDataFrame. The main difference with a regular Pandas DataFrame is that a GeoDataFrame features a geometry column, which is a geoseries containing the points, paths, and polygons for each row. For example, if each row represents one district, the respective geometries would probably contain the geospatial boundaries…

✏️ Are you curious what the districts DataFrame actually looks like? Take a look at it with the methods you know by now:

In [ ]:
 

Geographically speaking, the districts are defined by their geographic shapes, which are represented as polygons, each of which is a list of tuples of geographic coordinates. Next we add information about schools in Potsdam:

In [ ]:
schools = gpd.read_file("https://opendata.potsdam.de/explore/dataset/schulen/download/?format=geojson")

✏️ Have a look at the schools as well, and compare the contents of the geometry columns in schools and districts. Do you notice anything?

In [ ]:
 

Query OpenStreetMap¶

OpenStreetMap (OSM) is "a collaborative project to create a free editable map of the world". As such it has millions of contributing users who have been collecting, updating and refining map data for over 15 years, which has generated a vastly comprehensive source of geographic data. It is by no means complete — whatever this would mean — but it is an impressively large geographic database and, of course, a map in itself, too.

To get a list of libraries in Potsdam (according to the OSM), we first need to find the right Potsdam. For this we use the geocoding powers of OSM through the Nominatim search service:

In [ ]:
from OSMPythonTools.nominatim import Nominatim
nominatim = Nominatim()
city = nominatim.query('Potsdam, Germany')
city.areaId()
Out[ ]:
3600062369

OpenStreetMap has its own kind of query language, which is quite compact and can also be a source for errors. To make query formulation easier, you can either use the web interface overpass turbo or the overpassQueryBuilder, which provides access to the main parameters:

In [ ]:
from OSMPythonTools.overpass import overpassQueryBuilder

library_query = overpassQueryBuilder(
    area=city.areaId(), # the query can be contrained by an area of an item
    elementType='node', # which are points (OSM also has ways and relations)
    # the selector in the next line is really the heart of the query:
    selector='"amenity"="library"', # we're looking for libraries
    out='body', # body indicates that we want the data, not just the count
    includeGeometry=True # and we want the geometric information, too
)

library_query
Out[ ]:
'area(3600062369)->.searchArea;(node["amenity"="library"](area.searchArea);); out body geom;'

The output of the above cell is the compact version of the query, which is carried out in the next step:

In [ ]:
from OSMPythonTools.overpass import Overpass
overpass = Overpass()

lib_data = overpass.query(library_query)

The variable lib_data now already contains the result from the query against OSM. Let's have a look at it. With elements() we can access the retrieved points. Let's take a look at the first entry:

In [ ]:
lib_data.elements()[0].tags()
Out[ ]:
{'addr:city': 'Potsdam',
 'addr:country': 'DE',
 'addr:housenumber': '5',
 'addr:postcode': '14469',
 'addr:street': 'Kiepenheuerallee',
 'amenity': 'library',
 'contact:email': 'bibliothek@fh-potsdam.de',
 'contact:fax': '+49 331 580 2229',
 'contact:phone': '+49 331 580 2211',
 'contact:website': 'https://www.fh-potsdam.de/informieren/organisation/wiss-einrichtungen/bibliothek/bibliothek-news/',
 'name': 'Hochschulbibliothek',
 'opening_hours': 'Mo-Fr 09:00-19:00; Sa 09:00-14:30; PH,Su off; 2019 Dec 21-2020 Jan 01 off',
 'opening_hours:url': 'https://www.fh-potsdam.de/informieren/organisation/wiss-einrichtungen/bibliothek/wir-ueber-uns/oeffnungszeiten/',
 'operator': 'Fachhochschule Potsdam',
 'operator:type': 'public',
 'operator:wikidata': 'Q896706',
 'operator:wikipedia': 'de:Fachhochschule Potsdam',
 'ref:isil': 'DE-525',
 'wheelchair': 'limited'}

Similarly, we can also access the geometry, which in this case is just a point:

In [ ]:
lib_data.elements()[0].geometry()
Out[ ]:
{"coordinates": [13.051358, 52.41372], "type": "Point"}

Next, we use the compact form of a list comprehension to extract the libraries' names and coordinates:

In [ ]:
libraries = [ (lib.tag("name"), lib.geometry() ) for lib in lib_data.elements()]

… which we turn into a GeoDataFrame. By naming the second column geometry we indicate towards GeoPandas to interpret the coordinates as geographic locations:

In [ ]:
libraries = gpd.GeoDataFrame(libraries, columns = ['name', 'geometry'])

Let's repeat the process to retrieve Berlin's trees (as recorded by the OSM community):

In [ ]:
# 1. prepare query (and directly include the location lookup)
tree_query = overpassQueryBuilder(
    area=nominatim.query('Berlin, Germany').areaId(),
    elementType='node',
    selector='"natural"="tree"', 
    out='body', 
    includeGeometry=True
)

# 2. execute query (and give it a bit more time to finish)
tree_data = overpass.query(tree_query, timeout=60)

# 3. get ids and coordinates of trees
tree_locs = [ (tree.id(), tree.geometry()) for tree in tree_data.elements()]

# 4. create GeoDataFrame
trees = gpd.GeoDataFrame(tree_locs, columns=["id", "geometry"])

trees.head()
Out[ ]:
id geometry
0 21487172 POINT (13.35177 52.51431)
1 26908663 POINT (13.34627 52.47173)
2 27306554 POINT (13.30086 52.52252)
3 27306733 POINT (13.30062 52.52235)
4 30429119 POINT (13.30101 52.52255)

What are you interested in? You may want to consult the Overpass API or play around with overpass turbo.

✏️ Copy the above code into the cell below and adjust the query and variable names according to your interest:

In [ ]:
 

📍 2. Process¶

One important processing step is turning addresses into geolocations.

Geocoding addresses¶

A typical challenge before actually visualizing geographic data is to extract the geographic coordinates of items of interest (may they be trees or libraries). When loading GeoJSON files or querying OpenStreetMap, the geometries are already included. However, oftentimes we may only have street addresses or the names of geographic entities such as cities or points of interests, which cannot be directly used to derive positions on a map. Yet, intuitively speaking, names of places and street addresses are unique ways of identifying locations. To turn such geographic strings into geographic tuples we rely on geocoding. In short: geocoding translates an address or place name into geographic coordinates.

There are plenty of commercial geocoders out there, but for the purpose of this tutorial we simply use OpenStreetMap's Nominatim service via the handy GeoPy package. (We could also stick to the OSMPythonTools that we already used above, but GeoPy has some handy ways of carrying out multiple geocoding steps in a batch.)

In [ ]:
# import Nominatim as geopy geocoder
from geopy.geocoders import Nominatim

# register a custom user agent (commercial services may also require an API key)
geocoder = Nominatim(user_agent="Data Visualization Tutorial 2023 · FH Potsdam")

Let's start with an address that students of FH Potsdam might be familiar with:

In [ ]:
loc = geocoder.geocode("Kiepenheuerallee 5, 14469 Potsdam")
print(loc)
Campus Fachhochschule Potsdam, 5, Kiepenheuerallee, Bornstedt, Potsdam Nord, Potsdam, Brandenburg, 14469, Deutschland

We can access the geographic coordinates one by one:

In [ ]:
print((loc.latitude, loc.longitude))
(52.4123583, 13.050748548573427)

Note that we can also use a name of a place; however, in this case the coordinates do not refer to the street address, but the center of the place:

In [ ]:
loc = geocoder.geocode("Fachhochschule Potsdam")
print((loc.latitude, loc.longitude))
(52.4121432, 13.0507812)

✏️ Give the geocoder a try and issue a geocoding request for an address or placename of your choice:

In [ ]:
 

Let's proceed with a dataset containing multiple places. Do you remember the childcare dataset we loaded in the data wrangling tutorial? The dataset actually did not include geospatial coordinates, but street addresses! Let's load the CSV again into a DataFrame:

In [ ]:
kitas = pd.read_csv("https://opendata.potsdam.de/explore/dataset/kitaboerse-20161108/download/", sep=";")

The hausnummer column contains some non-numerical items such as months (for no apparent reason). This will cause errors later during geocoding. Below we extract the first number encountered in the hausnummer cells. Some kitas have multiple house numbers, but for the purpose of this tutorial one will suffice.

In [ ]:
# the parameter of extract is a regular expression that matches the first set of digits 
kitas.hausnummer = kitas.hausnummer.str.extract('(\d+)')

# in some cases the value is not a number, which we will replace with zeros
kitas.hausnummer = kitas.hausnummer.fillna(0)

Geocoding can take a bit of time, so we limit ourselves to just a few random entries here:

In [ ]:
kitas = kitas.sample(20)

✏️ If you have a bit of time, increase the above number or simply comment out the line to analyze all kitas in Potsdam

Note that the address information is spread across three columns: strasse, hausnummer, postleitzahl.

We will turn them into one column, with which we query the geocoder. For the purpose of this tutorial, we keep the remaining data at a minimum and thus only keep the names of the kitas and their capacities:

In [ ]:
kitas
Out[ ]:
name_der_kindertagesbetreuungseinrichtung stand_vom barrierefreie_einrichtung kinderkrippe_0_3_j tagespflege_0_3_j padagogisch_begleitete_spielgruppe_0_3_j kindergarten_3_j_schuleintritt hort_ab_schuleintritt andere_kinderbetreuung_ab_3_klasse strasse ... abweichende_offnungszeiten ubernachtung_moglich schliesstage_von_bis fruhstuck mittag vesper abendessen versorgungsart link_zum_bereich_kinder_und_jugend_der_landeshauptstadt_potsdam link_zu_anmeldeinformationen_der_einrichtung
13 Turmspatzen, AWO, Kita 01.09.2016 Ja Ja Nein Nein Ja Ja Nein Kaiser-Friedrich-Str. ... NaN Nein keine Ja Ja Ja Nein Mischversorgung http://www.potsdam.de/kita-tipp http://www.awo-potsdam.de/einrichtungen-und-di...
86 Farbenspiel, Kita 01.11.2015 Nein Ja Nein Nein Ja Nein Nein Peter-Huchel-Str. ... NaN Nein NaN Ja Ja Ja Nein keine Angabe http://www.potsdam.de/kita-tipp http://anmeldung.die-kinderwelt.com/
132 Spielgruppe Waldstadt 08.11.2016 Nein Nein Nein Ja Nein Nein Nein Ginsterweg ... NaN Nein NaN Ja Ja Ja Nein Eigenversorgung am Standort http://www.potsdam.de/kita-tipp http://pbhev.de/?page_id=21
93 Hort der evangelischen Grundschule Potsdam 09.09.2016 Nein Nein Nein Nein Nein Ja Nein Große Weinmeisterstr. ... Ferien: 8 bis 16 Uhr Nein NaN Nein Nein Nein Nein keine Angabe http://www.potsdam.de/kita-tipp http://www.hoffbauer-bildung.de/grundschule-po...
106 Nimmerland, Hort 08.09.2016 Nein Nein Nein Nein Nein Ja Nein Karl-Marx-Str. ... NaN Nein Weihnachtsferien Nein Ja Ja Nein Eigenversorgung geliefert http://www.potsdam.de/kita-tipp http://www.elternverein-zwergenland.de/wp-cont...
9 Hort an der Regenbogenschule, Fahrland 08.11.2016 Ja Nein Nein Nein Ja Ja Nein Ketziner Str. ... NaN Nein zwischen Weihnachten und Silvester, am Freitag... Nein Ja Ja Nein Fremdversorgung http://www.potsdam.de/kita-tipp http://www.treffpunkt-fahrland.de/index.php?op...
83 Inselmäuse, AWO, Kita 01.11.2015 Nein Ja Nein Nein Ja Nein Nein Burgstr. ... NaN Nein Zwischen Weihnachten und Neujahr. Einzelne Sch... Ja Ja Ja Nein Eigenversorgung geliefert http://www.potsdam.de/kita-tipp http://www.awo-potsdam.de/einrichtungen-und-di...
113 Havelsprotten, AWO, Hort 01.11.2015 Nein Nein Nein Nein Nein Ja Nein Burgstr. ... NaN Nein NaN Nein Nein Ja Nein Eigenversorgung geliefert http://www.potsdam.de/kita-tipp http://www.awo-potsdam.de/einrichtungen-und-di...
91 Evangelische Kita Hoffkids 13.09.2016 Nein Ja Nein Nein Ja Nein Nein Alt Nowawes ... Fr 7.45 bis 16 Uhr Nein 25 Tage in den Schulferien Ja Ja Ja Nein Fremdversorgung http://www.potsdam.de/kita-tipp http://www.hoffbauer-bildung.de/kita-hoffkids/...
37 Treffpunkt Freizeit, AKi 01.11.2015 Nein Nein Nein Nein Nein Ja Ja Am Neuen Garten ... NaN Nein NaN Nein Nein Nein Nein keine Angabe http://www.potsdam.de/kita-tipp http://www.treffpunktfreizeit.de
77 Kinderclub Einsteinkids, AKi 21.09.2016 Nein Nein Nein Nein Nein Nein Ja Knobelsdorffstr. ... NaN Nein NaN Nein Nein Nein Nein keine Angabe http://www.potsdam.de/kita-tipp https://www.ejf.de/kinder-und-jugendhilfe/alle...
66 first steps Deutsch-Englischer Kindergarten 01.03.2016 Nein Ja Nein Nein Ja Nein Nein Ludwig-Richter-Str. ... Bei Bedarf Änderungen in Absprache möglich. Nein NaN Ja Ja Ja Nein Mischversorgung http://www.potsdam.de/kita-tipp http://www.firststeps-kita.de/aufnahme/anmeldu...
127 Tönemaler, Kita 01.09.2016 Nein Ja Nein Nein Ja Nein Nein David-Gilly-Str. ... NaN Nein NaN Ja Ja Ja Nein Mischversorgung http://www.potsdam.de/kita-tipp http://www.gfb-potsdam.de/CMS/index.php?option...
30 Elterninitiativkita Butzemannhaus e.V. 01.09.2016 Nein Ja Nein Nein Ja Nein Nein Seepromenade ... Fr bis 16:30 Uhr Nein 06.05.16, Brückentag, 21.12.16- 03.01.2017, We... Ja Ja Ja Nein Eigenversorgung am Standort http://www.potsdam.de/kita-tipp http://butzemannhaus.grossglienicke.de/index.html
8 Kinderhaus Fridolin 01.11.2015 Nein Ja Nein Nein Ja Nein Nein Alleestr. ... NaN Nein 5 Fortbildungstage Ja Ja Ja Ja Mischversorgung http://www.potsdam.de/kita-tipp www.fidl.de
5 Oberlin-Kindertagesstätte Eiche 01.02.2016 Nein Ja Nein Nein Ja Ja Nein Kaiser-Friedrich-Str. ... Hort: 6.30 bis 7:45 Uhr und 11.30 bis 18 Uhr Nein NaN Ja Ja Ja Nein Eigenversorgung am Standort http://www.potsdam.de/kita-tipp NaN
55 Kids Company, Kita 01.11.2015 Nein Ja Nein Nein Ja Nein Nein Potsdamer Str. ... NaN Nein 06.11.2015, 24.12.2015-01.01.2016, 06.05.2016,... Ja Ja Ja Nein Eigenversorgung geliefert http://www.potsdam.de/kita-tipp http://www.kita-kidscompany.de/index.php/anmel...
135 Tagespflegepersonen FidL - Frauen in der Leben... 01.11.2015 Nein Nein Ja Nein Nein Nein Nein Alleestr. ... Terminvereinbarung per Mail info@fidl.de oder ... Nein NaN Ja Ja Ja Nein keine Angabe http://www.potsdam.de/kita-tipp http://fidl.de/kinderbetreuung/tagespflege
78 Bergmännchen, Kita 08.11.2016 Nein Ja Nein Nein Ja Nein Nein Charlottenstr. ... NaN Nein NaN Ja Ja Ja Nein keine Angabe http://www.potsdam.de/kita-tipp http://www.hoffbauer-bildung.de/kita-bergmaenn...
90 Firlefanz, Kita 01.11.2015 Ja Ja Nein Nein Ja Nein Nein Nedlitzer Holz ... NaN Nein 06.11.2015 pädagogischer Tag; 24.12.2015 - 01.... Ja Ja Ja Nein Eigenversorgung geliefert http://www.potsdam.de/kita-tipp http://www.kita-firlefanz.de/resources/aufnahm...

20 rows × 49 columns

In [ ]:
# names of the childcare places
names = kitas["name_der_kindertagesbetreuungseinrichtung"]

# the capacity of the kitas; which we turn into integers
capac = kitas["platze_unbefristet"].fillna(0).astype(int)

# columns containing address information
addr = ['strasse', 'hausnummer', 'postleitzahl']

# we join the values in the three address-related columns for each row
addresses = kitas[addr].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)

# the dataframe we will use
kitas = pd.DataFrame({'name': names, 'capacity': capac, 'address': addresses})
kitas
Out[ ]:
name capacity address
13 Turmspatzen, AWO, Kita 205 Kaiser-Friedrich-Str. 15 14469 Potsdam
86 Farbenspiel, Kita 120 Peter-Huchel-Str. 1 14469 Potsdam
132 Spielgruppe Waldstadt 15 Ginsterweg 3 14478 Potsdam
93 Hort der evangelischen Grundschule Potsdam 250 Große Weinmeisterstr. 49 14469 Potsdam
106 Nimmerland, Hort 30 Karl-Marx-Str. 72 14482 Potsdam
9 Hort an der Regenbogenschule, Fahrland 204 Ketziner Str. 31 14476 Potsdam
83 Inselmäuse, AWO, Kita 63 Burgstr. 23 14467 Potsdam
113 Havelsprotten, AWO, Hort 240 Burgstr. 23 14467 Potsdam
91 Evangelische Kita Hoffkids 23 Alt Nowawes 94 14482 Potsdam
37 Treffpunkt Freizeit, AKi 25 Am Neuen Garten 64 14469 Potsdam
77 Kinderclub Einsteinkids, AKi 29 Knobelsdorffstr. 7 14471 Potsdam
66 first steps Deutsch-Englischer Kindergarten 20 Ludwig-Richter-Str. 9 14467 Potsdam
127 Tönemaler, Kita 84 David-Gilly-Str. 3 14469 Potsdam
30 Elterninitiativkita Butzemannhaus e.V. 60 Seepromenade 54 14476 Potsdam
8 Kinderhaus Fridolin 78 Alleestr. 11 14469 Potsdam
5 Oberlin-Kindertagesstätte Eiche 123 Kaiser-Friedrich-Str. 106 14469 Potsdam
55 Kids Company, Kita 71 Potsdamer Str. 63 14469 Potsdam
135 Tagespflegepersonen FidL - Frauen in der Leben... 90 Alleestr. 1 14469 Potsdam
78 Bergmännchen, Kita 109 Charlottenstr. 72 14467 Potsdam
90 Firlefanz, Kita 54 Nedlitzer Holz 12 14469 Potsdam

Now let's turn addresses into geometries! For each cell in the address column, a query against OSM will be triggered; to spread out the load we use a RateLimiter provided by GeoPy:

In [ ]:
from geopy.extra.rate_limiter import RateLimiter

# add a delay of one second between each geocoding request
geocode = RateLimiter(geocoder.geocode, min_delay_seconds=1)

Next we invoke the geocoder and apply it to the address column (depending on how many entries are to be geocoded, this can take a while):

In [ ]:
# apply geocoding to address column; store responses in location column 
kitas['location'] = kitas['address'].apply(geocode)

There might be some kitas that have not location information, i.e., for which the geocoder was not able to identify latitude and longitude. We only keep those rows that have location information, i.e., that are notnull():

In [ ]:
kitas = kitas[kitas['location'].notnull()]

After that we use one list comprehension to extract latitudes and longitudes from the locations column, which we will then use to transform the DataFrame into a GeoDataFrame featuring its own geometry column:

In [ ]:
# create empty columns for coordinates
kitas.loc[:, "lat"] = None
kitas.loc[:, "lon"] = None

# extract lat and lon from locations via one list comprehensions
kitas.loc[:, ['lat', 'lon']] = [ (loc.latitude, loc.longitude) for loc in kitas['location'] ]

# # create GeoDataFrame, pointing explicitly to lon and lat columns
kitas = gpd.GeoDataFrame(kitas, geometry=gpd.points_from_xy(kitas.lon, kitas.lat))

# # remove superfluous columns that are not needed anymore
kitas = kitas.drop(columns=['location', 'lat', 'lon'])

kitas
Out[ ]:
name capacity address geometry
13 Turmspatzen, AWO, Kita 205 Kaiser-Friedrich-Str. 15 14469 Potsdam POINT (12.99432 52.40529)
86 Farbenspiel, Kita 120 Peter-Huchel-Str. 1 14469 Potsdam POINT (13.05312 52.42447)
132 Spielgruppe Waldstadt 15 Ginsterweg 3 14478 Potsdam POINT (13.09033 52.36659)
93 Hort der evangelischen Grundschule Potsdam 250 Große Weinmeisterstr. 49 14469 Potsdam POINT (13.06226 52.41541)
106 Nimmerland, Hort 30 Karl-Marx-Str. 72 14482 Potsdam POINT (13.12186 52.39552)
9 Hort an der Regenbogenschule, Fahrland 204 Ketziner Str. 31 14476 Potsdam POINT (13.01783 52.46601)
83 Inselmäuse, AWO, Kita 63 Burgstr. 23 14467 Potsdam POINT (13.06774 52.39741)
113 Havelsprotten, AWO, Hort 240 Burgstr. 23 14467 Potsdam POINT (13.06774 52.39741)
91 Evangelische Kita Hoffkids 23 Alt Nowawes 94 14482 Potsdam POINT (13.09017 52.39722)
37 Treffpunkt Freizeit, AKi 25 Am Neuen Garten 64 14469 Potsdam POINT (13.06460 52.40815)
77 Kinderclub Einsteinkids, AKi 29 Knobelsdorffstr. 7 14471 Potsdam POINT (13.02383 52.38774)
66 first steps Deutsch-Englischer Kindergarten 20 Ludwig-Richter-Str. 9 14467 Potsdam POINT (13.07678 52.41135)
127 Tönemaler, Kita 84 David-Gilly-Str. 3 14469 Potsdam POINT (13.03723 52.41535)
30 Elterninitiativkita Butzemannhaus e.V. 60 Seepromenade 54 14476 Potsdam POINT (13.10524 52.46002)
8 Kinderhaus Fridolin 78 Alleestr. 11 14469 Potsdam POINT (13.05998 52.40949)
5 Oberlin-Kindertagesstätte Eiche 123 Kaiser-Friedrich-Str. 106 14469 Potsdam POINT (12.99447 52.40504)
55 Kids Company, Kita 71 Potsdamer Str. 63 14469 Potsdam POINT (13.00990 52.42016)
135 Tagespflegepersonen FidL - Frauen in der Leben... 90 Alleestr. 1 14469 Potsdam POINT (13.05937 52.40886)
78 Bergmännchen, Kita 109 Charlottenstr. 72 14467 Potsdam POINT (13.06582 52.40118)
90 Firlefanz, Kita 54 Nedlitzer Holz 12 14469 Potsdam POINT (13.05218 52.42579)

🗺 3. Present¶

When we have geospatial data readily available as GeoDataFrames, we can now map them with Altair.

(There are other mapping libraries for Python, such as Folium, that provide additional functionalities. Altair's geovis features are basic, but do provide some variety of techniques and have the benefit of working consistently with the other chart types we covered.)

Markers on maps¶

A simple start is placing locations on a base map and adding a bit of further information via tooltips. Let's do this with Potsdam's schools! First, we can have another look at the attributes:

In [ ]:
schools.head(5)
Out[ ]:
schulnum_1 schulnumme ort trager y_e89_rbs plz sozialraum planungsra status schulname schulform strasse standort x_e89_rbs geometry
0 105478 12 Potsdam öffentlich 5.806710e+06 14471 III 303 Aktiv Gerhart-Hauptmann-Grundschule Potsdam G Carl-von-Ossietzky-Straße 37 Hauptstandort 366239.9998 POINT (13.03417 52.39427)
1 105491 16 Potsdam öffentlich 5.806730e+06 14482 IV 402 Aktiv Grundschule Bruno H. Bürgel G Karl-Liebknecht-Straße 29 Hauptstandort 370309.9999 POINT (13.09394 52.39543)
2 106770 17 Potsdam öffentlich 5.810276e+06 14469 II 201 Aktiv Grundschule Am Jungfernsee G Fritz-von-der-Lancken-Straße 2 Hauptstandort 367861.5560 POINT (13.05657 52.42671)
3 401250 18 Potsdam öffentlich 5.802902e+06 14478 VI 604 Aktiv Fröbelschule Schule mit dem sonderpädagogische... F Zum Teufelssee 6 Hauptstandort 369969.3860 POINT (13.09042 52.36096)
4 105466 20 Potsdam öffentlich 5.803430e+06 14480 V 502 Aktiv Grundschule Am Priesterweg G Oskar-Meßter-Straße 4-6 Hauptstandort 372949.9999 POINT (13.13397 52.36640)

This gives us plenty of aspects to visualize.

We will now create a simple map with markers in the form of an Altair chart consisting of two layers:

  1. The districts form the lower layer representing their boundaries and the overall geographic shape of Potsdam
  2. The schools are the points of interests that are displayed on top

When putting the two layers together they should actually refer to the same geographic location to make sense. Here the districts and schools both refer to Potsdam. Also note that the order when the charts are added together determines the vertical order: first the basemap and then the markers on top.

In [ ]:
# 1.  mark_geoshape transparently uses the geometry column
basemap = alt.Chart(districts).mark_geoshape(
    # add some styling to reduce the salience of the basemap
    fill="lightgray", stroke="darkgray"
).properties(width=600, height=600)

# 2.  we use mark_circle to have more control over the visual variables
markers = alt.Chart(schools).mark_circle(opacity=1).encode(
    # point latitude & longitude to coordinates in the geometry column
    longitude='geometry.coordinates[0]:Q',
    latitude='geometry.coordinates[1]:Q',
    tooltip=['schulname', 'strasse', 'trager'],
)

# combine the two layers 
basemap + markers
Out[ ]:

✏️ How about changing the color of the dots according to trager or schulform?

Graduated symbol maps¶

This technique adjusts the visual features of markers to encode quantitative data dimensions. For example, we can use varying sizes of circles to represent the capacities of Potsdam's kitas. We use the same two-layer structure we used above:

In [ ]:
basemap = alt.Chart(districts).mark_geoshape(
    fill="lightgray", stroke="darkgray"
).properties(width=600, height=600)

markers = alt.Chart(kitas).mark_circle(opacity=1).encode(
    longitude='geometry.coordinates[0]:Q',    
    latitude='geometry.coordinates[1]:Q',    
    tooltip=['name:N', 'address:N', 'capacity:Q'],
    size="capacity:Q"
)

basemap + markers
Out[ ]:

Dot density maps¶

When we are dealing with thousands of elements, we are reaching perceptual and technical limitations. One way to mitigate the technical limitations is to take a sample of the elements, large enough to see overall patterns. This is what we are now doing with Berlin's trees, of which there are far too many to display individually, however, a sample might still be informative:

In [ ]:
# by default Altair handles a max number of 5000 items only
# the following line disables this limitation; read more here:
# https://altair-viz.github.io/user_guide/large_datasets.html
alt.data_transformers.disable_max_rows()

# to not overburden altair, we're passing a sample of 10k trees
treemap = alt.Chart(trees.sample(n=10000)).mark_circle(
    # reduce the visual presence of each element
    size=5,
    # with a low dot opacity we can use overplotting to indicate densities
    opacity=.25,
    # a natural choice
    color="green"
).encode(
    longitude='geometry.coordinates[0]:Q', 
    latitude='geometry.coordinates[1]:Q'    
).properties(width=600, height=600)

treemap
Out[ ]:

✏️ Add a base layer underneath with Berlin's districts; the Technologiestiftung Berlin offers spatial data in various shapes and sizes including for district boundaries. Hint: you might have to flip the winding order of the geometries.

Choropleth maps¶

Finally, let's create the geovisualization that uses the fill color of geospatial shapes to encode quantitative data. To illustrate this, we will visualize the population densities around the world. We will use area and population information from GeoNames and get the geographic shapes of countries from DataHub.

In [ ]:
# load country data from geonames 
geonames = pd.read_csv("https://www.geonames.org/countryInfoCSV", sep='\t')
# select four columns
geonames = geonames[['name', 'iso alpha3', 'areaInSqKm', 'population']]
# set index to country code
geonames = geonames.set_index("iso alpha3")

geonames.head()
Out[ ]:
name areaInSqKm population
iso alpha3
AND Andorra 468.0 77006
ARE United Arab Emirates 82880.0 9630959
AFG Afghanistan 647500.0 37172386
ATG Antigua and Barbuda 443.0 96286
AIA Anguilla 102.0 13254

Next we collect the geographic boundaries and simplify them a bit, as they have more detail than what we need here:

In [ ]:
# load country's polygons from datahub
polygons = gpd.read_file("https://datahub.io/core/geo-countries/r/countries.geojson")

# remove country names, as we have them already
polygons = polygons.drop(columns=["ADMIN"])
# set index to country code
polygons = polygons.set_index("ISO_A3")
# reduce the complexity of the shapes
polygons.geometry = polygons.geometry.simplify(.1)

polygons.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 255 entries, ABW to ZWE
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   geometry  255 non-null    geometry
dtypes: geometry(1)
memory usage: 12.1+ KB

As both DataFrames use the three-letter country codes as indices, we can join them like this:

In [ ]:
# inner means that we keep only those countries
# for which we have geometric and attribute data
countries = polygons.join(geonames, how='inner')

countries.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 238 entries, ABW to ZWE
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   geometry    238 non-null    geometry
 1   name        238 non-null    object  
 2   areaInSqKm  238 non-null    float64 
 3   population  238 non-null    int64   
dtypes: float64(1), geometry(1), int64(1), object(1)
memory usage: 17.4+ KB

Visualizing area or population in a choropleth map, arguably, makes little sense. So let's compute population densities:

In [ ]:
countries["density"] = countries["population"] / countries["areaInSqKm"]

Keep only those countries with valid density value and turn these densities into integers:

In [ ]:
countries = countries[countries['density'].notna()]
countries.density = countries.density.round(0).astype(int)

There is one ‘country’ that is not really one, which is Antarctica. We'll remove this from the list here.

In [ ]:
countries = countries.drop("ATA")

One last manipulation: Polygons are drawn in a specific "winding order" (clockwise or counter-clockwise) in order to distinguish between polygons that go outwards or inwards. Altair doesn't follow the widely used conventions, so we need to turn everything around.

In [ ]:
from shapely.ops import orient
countries.geometry = countries.geometry.apply(orient, args=(-1,))

Finally, we draw the chart using Altair's mark_geoshape() method. The distribution of densities is highly skewed, due to very small countries with relatively high population numbers, such as Monaco. To spread out the low and high density values we use a logarithmic scale and set the domain between 1 and 1000. Note that the domain has to end in a multiple of the base, which is by default 10.

In [ ]:
alt.Chart(countries).mark_geoshape().encode(
    color=alt.Color('density', scale=alt.Scale(type="log", domain=[1,1000] )),
    tooltip=['name', 'areaInSqKm', 'population', 'density']
).project(
  # enter different projection here
).properties(
    width=750,
    height=500
)
Out[ ]:

The map is shown in the default Mercator projection, which particularly distorts the area sizes of North America, Europe and Russia in contrast to Africa, Southern Asia and parts of South America.

✏️ Change the projection used above to one that does not distort area sizes as much (see this list for options).

Sources¶

Tutorials & Documentation

  • Specifying Geospatial Data in Altair
  • GeoPandas
  • OSMPythonTools
  • GeoPy

Data

  • Potsdam Open Data-Portal
  • OpenStreetMap
  • GeoNames
  • DataHub